home
***
CD-ROM
|
disk
|
FTP
|
other
***
search
/
Turnbull China Bikeride
/
Turnbull China Bikeride - Disc 2.iso
/
STUTTGART
/
TEMP
/
GNU
/
bison
/
MysteryCon
< prev
next >
Wrap
Text File
|
1995-06-28
|
3KB
|
106 lines
Mystery Conflicts
Previous: <Reduce\/Reduce=>ReduceRedv> * Next: <Stack Overflow=>StackOverf> * Up: <Algorithm=>Algorithm>
#Wrap on
{fH3}Mysterious Reduce\/Reduce Conflicts{f}
Sometimes reduce\/reduce conflicts can occur that don't look warranted.
Here is an example:
#Wrap off
#fCode
%token ID
%%
def: param\_spec return\_spec ','
;
param\_spec:
type
| name\_list ':' type
;
return\_spec:
type
| name ':' type
;
type: ID
;
name: ID
;
name\_list:
name
| name ',' name\_list
;
#f
#Wrap on
It would seem that this grammar can be parsed with only a single token
of look-ahead: when a {fCode}param\_spec{f} is being read, an {fCode}ID{f} is
a {fCode}name{f} if a comma or colon follows, or a {fCode}type{f} if another
{fCode}ID{f} follows. In other words, this grammar is LR(1).
However, Bison, like most parser generators, cannot actually handle all
LR(1) grammars. In this grammar, two contexts, that after an {fCode}ID{f}
at the beginning of a {fCode}param\_spec{f} and likewise at the beginning of
a {fCode}return\_spec{f}, are similar enough that Bison assumes they are the
same. They appear similar because the same set of rules would be
active---the rule for reducing to a {fCode}name{f} and that for reducing to
a {fCode}type{f}. Bison is unable to determine at that stage of processing
that the rules would require different look-ahead tokens in the two
contexts, so it makes a single parser state for them both. Combining
the two contexts causes a conflict later. In parser terminology, this
occurrence means that the grammar is not LALR(1).
In general, it is better to fix deficiencies than to document them. But
this particular deficiency is intrinsically hard to fix; parser
generators that can handle LR(1) grammars are hard to write and tend to
produce parsers that are very large. In practice, Bison is more useful
as it is now.
When the problem arises, you can often fix it by identifying the two
parser states that are being confused, and adding something to make them
look distinct. In the above example, adding one rule to
{fCode}return\_spec{f} as follows makes the problem go away:
#Wrap off
#fCode
%token BOGUS
…
%%
…
return\_spec:
type
| name ':' type
\/\* This rule is never used. \*\/
| ID BOGUS
;
#f
#Wrap on
This corrects the problem because it introduces the possibility of an
additional active rule in the context after the {fCode}ID{f} at the beginning of
{fCode}return\_spec{f}. This rule is not active in the corresponding context
in a {fCode}param\_spec{f}, so the two contexts receive distinct parser states.
As long as the token {fCode}BOGUS{f} is never generated by {fCode}yylex{f},
the added rule cannot alter the way actual input is parsed.
In this particular example, there is another way to solve the problem:
rewrite the rule for {fCode}return\_spec{f} to use {fCode}ID{f} directly
instead of via {fCode}name{f}. This also causes the two confusing
contexts to have different sets of active rules, because the one for
{fCode}return\_spec{f} activates the altered rule for {fCode}return\_spec{f}
rather than the one for {fCode}name{f}.
#Wrap off
#fCode
param\_spec:
type
| name\_list ':' type
;
return\_spec:
type
| ID ':' type
;
#f
#Wrap on